Expected scalarised returns dominance: a new solution concept for multi-objective decision making

Abstract

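In many real-world scenarios, the utility of a user is derived from a single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this work, we propose first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also define a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice. Additionally, we define a new solution concept called the ESR set, which is a set of policies that are ESR dominant. Finally, we present a new multi-objective tabular distributional reinforcement learning (MOTDRL) algorithm to learn the ESR set in multi-objective multi-armed bandit settings.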
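As a rough illustration of the dominance idea in the abstract, the sketch below compares two discrete multi-objective return distributions through their cumulative distribution functions. It assumes the usual CDF-based reading of first-order stochastic dominance (A dominates B when F_A(v) <= F_B(v) at every point v), with the strict variant additionally requiring F_A(v) < F_B(v) somewhere. The function names, the dictionary representation, and the example distributions are hypothetical and are not taken from the paper; this is not the MOTDRL algorithm.

```python
from itertools import product


def cdf(dist, v):
    """P(X <= v), component-wise, for a discrete distribution.

    `dist` maps outcome tuples (one entry per objective) to probabilities.
    """
    return sum(p for x, p in dist.items()
               if all(xi <= vi for xi, vi in zip(x, v)))


def compare_cdfs(dist_a, dist_b, tol=1e-12):
    """Return (weak, strict) for 'A dominates B' under the CDF test.

    weak   -> F_A(v) <= F_B(v) at every grid point v
    strict -> weak holds and F_A(v) < F_B(v) for at least one v
    """
    dims = len(next(iter(dist_a)))
    outcomes = list(dist_a) + list(dist_b)
    # For discrete distributions the CDFs are step functions, so it is
    # enough to check the grid spanned by the observed values per objective.
    axes = [sorted({x[i] for x in outcomes}) for i in range(dims)]
    strict = False
    for v in product(*axes):
        f_a, f_b = cdf(dist_a, v), cdf(dist_b, v)
        if f_a > f_b + tol:
            return (False, False)
        if f_a < f_b - tol:
            strict = True
    return (True, strict)


def strictly_dominates(dist_a, dist_b):
    """Assumed strict test: weak dominance plus strictness at some point."""
    weak, strict = compare_cdfs(dist_a, dist_b)
    return weak and strict


# Hypothetical two-objective return distributions.
policy_a = {(1, 1): 0.5, (3, 3): 0.5}
policy_b = {(0, 0): 0.5, (2, 2): 0.5}
policy_c = {(0, 4): 1.0}
policy_d = {(4, 0): 1.0}

print(strictly_dominates(policy_a, policy_b))
print(strictly_dominates(policy_c, policy_d))
```

Under this test, two policies whose distributions trade one objective against the other (such as `policy_c` and `policy_d`) dominate in neither direction, so both would be kept in a solution set.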

Similar resources

Solution of security constrained unit commitment problem by a new multi-objective optimization method

Abstract: Optimal power flow has long been studied as one of the fundamental tools for analysing complex power systems. It optimises the objective functions of a power system, such as fuel cost, emissions, and losses, while simultaneously satisfying the power system's constraints. In its most general form, OPF is a nonlinear, non-convex, large-scale, static optimisation problem that can include continuous control variables and ...

Multi-Objective Decision Making

Many real-world tasks require making decisions that involve multiple possibly conflicting objectives. To succeed in such tasks, intelligent systems need planning or learning algorithms that can efficiently find different ways of balancing the trade-offs that such objectives present. In this tutorial, we provide an introduction to decision-theoretic approaches to coping with multiple objectives. We ...

A BI-LEVEL LINEAR MULTI-OBJECTIVE DECISION MAKING MODEL WITH INTERVAL COEFFICIENTS FOR SUPPLY CHAIN COORDINATION

Bi-level programming, a tool for modeling decentralized decisions, consists of the objective(s) of the leader at its first level and that of the follower at the second level. Three-level programming results when the second level is itself a bi-level program. By extending this idea, it is possible to define multi-level programs with any number of levels. Supply chain planning problems are co...

A Tabu Search Method for a New Bi-Objective Open Shop Scheduling Problem by a Fuzzy Multi-Objective Decision Making Approach (RESEARCH NOTE)

This paper proposes a novel, bi-objective mixed-integer mathematical programming for an open shop scheduling problem (OSSP) that minimizes the mean tardiness and the mean completion time. To obtain the efficient (Pareto-optimal) solutions, a fuzzy multi-objective decision making (fuzzy MODM) approach is applied. By the use of this approach, the related auxiliary single objective formulation can...

A bi-level linear multi-objective decision making model with interval coefficients for supply chain coordination

Abstract: Bi-level programming, a tool for modeling decentralized decisions, consists of the objective(s) of the leader at its first level and that of the follower at the second level. Three-level programming results when the second level is itself a bi-level program. By extending this idea, it is possible to define multi-level programs with any number of levels. Supply chain planning problem...

Journal

Journal title: Neural Computing and Applications

Year: 2022

ISSN: 0941-0643, 1433-3058

DOI: https://doi.org/10.1007/s00521-022-07334-x